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Abstract 

This paper presents a novel method for controlling teams 
of unmanned aerial vehicles using Stochastic Optimal Con¬ 
trol (SOC) theory. The approach consists of a centralized 
high-level planner that computes optimal state trajectories as 
velocity sequences, and a platform-specific low-level con¬ 
troller which ensures that these velocity sequences are met. 
The planning task is expressed as a centralized path-integral 
control problem, for which optimal control computation cor¬ 
responds to a probabilistic inference problem that can be 
solved by efficient sampling methods. Through simulation 
we show that our SOC approach (a) has significant benefits 
compared to deterministic control and other SOC methods 
in multimodal problems with noise-dependent optimal solu¬ 
tions, (b) is capable of controlling a large number of platforms 
in real-time, and (c) yields collective emergent behaviour in 
the form of flight formations. Finally, we show that our ap¬ 
proach works for real platforms, by controlling a team of 
three quadrotors in outdoor conditions. 


1 Introduction 

The recent surge in autonomous Unmanned Aerial Vehicle 
(UAV) research has been driven by the ease with which plat¬ 
forms can now be acquired, evolving legislation that reg¬ 
ulates their use, and the broad range of applications en¬ 
abled by both individual platforms and cooperative swarms. 
Example applications include automated delivery systems, 
monitoring and surveillance, target tracking, disaster man¬ 
agement and navigation in areas inaccessible to humans. 

Quadrotors are a natural choice for an experimental plat¬ 
form, as they provide a safe, highly-agile and inexpensive 
means by which to evaluate UAV controllers. Figure[l]shows 
a 3D model of one such quadrotor, the Ascending Technolo¬ 
gies Pelican. Quadrotors have non-linear dynamics and are 
naturally unstable, making control a non-trivial problem. 

Stochastic optimal control (SOC) provides a promising 
theoretical framework for achieving autonomous control of 
quadrotor systems. In contrast to deterministic control, SOC 
directly captures the uncertainty typically present in noisy 
environments and leads to soluti ons that qualitatively de¬ 
pend on the level of uncertainty ( Kappen 20051 . However, 
with the exception of the simple Linear Quadratic Gaussian 
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Figure 1: Control hierarchy: The path-integral controller 
(1) calculates target velocities/heights for each quadrotor. 
These are converted to roll, pitch, throttle and yaw rates by 
a platform-specific Velocity Height PID controller (2). This 
control is in turn passed to the platform’s flight control sys¬ 
tem (3), and converted to relative motor speed changes. 


case, for which a closed form solution exists, solving the 
SOC problem requires solving the Hamilton Jacobi Bellman 
(HJB) equations. These equations are generally intractable, 
and so the SOC problem remains an open challenge. 

In such a complex setting, a hierarchical approach is usu¬ 
ally taken and the control problem is reduced to follow a 
state-trajectory (or a set of way points) designed by hand 
or computed offline using trajectory planning algorithms 
( [Kendoul 2012] ). While the planning step typically involves 
a low-dimensional state representation, the control methods 
use a detailed complex state representation of the UAV Ex¬ 
amples of control methods for trajectory tracking are the 
Proportional Integral Derivative or the Linear-Quadratic reg¬ 
ulator. 










































A generic class of SOC problems was introduced in Kap¬ 
pen Todorov ( [2005 2006 ) for which the controls and the 
cost function are restricted in a way that makes the HJB 
equation linear and therefore more efficiently solvable. This 
class of problems is known as path integral (PI) control, 
linearly-solvable controlled diffusions or Kullback-Leibler 
control, and it has lead to successful robotic applications, 


e.g. ([Kinjo, Uchibe, and Doya 2013[ Rombokas et al. 2012[ 

|Theodorou, Buchli, and Schaal 20101 )7 A particularly inter¬ 
esting feature of this class of problems is that the compu¬ 
tation of optimal control is an inference problem with a so¬ 
lution given in terms of the passive dynamics. In a multi¬ 
agent system, where the agents follow independent passive 
dynamics, such a feature can be exploited using approxi¬ 
mate inference methods such as v ariational approximations 
or belief propagation ( [Kappen, Gomez, and Qpper 2012[ 
Van Den Broek, Wiegerinck, and Kappen 2008). 

In this paper, we show how PI control can be used for 
solving motion planning tasks on a team of quadrotors in 
real time. We combine periodic re-planning with receding 
horizon, similarly to model predictive control, with effi¬ 
cient importance sampling. At a high level, each quadro- 
tor is modelled as a point mass that follows simple dou¬ 
ble integrator dynamics. Low-level control is achieved us¬ 
ing a standard Proportional Integral Derivative (PID) veloc¬ 
ity controller that interacts with a real or simulated flight 
control system. With this strategy we can scale PI control to 
ten units in simulation. Although in principle there are no 
further limits to experiments with actual platforms, our first 
results with real quadrotors only include three units. To the 
best of our knowledge this has been the first real-time im¬ 
plementation of PI control on an actual multi-agent system. 

In the next section we describe related work. We introduce 
our approach in Section [3] Results are shown on three dif¬ 
ferent scenarios in Section]?] Finally, Section [5] concludes 
this paper. 

2 Related Work on UAV Planning and 
Control 

There is a large and growing body of literature related to 
this topic. In this section, we highlight some of the most 
related papers to the presented approach. An recent sur¬ 
vey of control methods for general UAVs can be found 
in [KendouT| ( [20 1 2 ) . 

Stochastic optimal control is mostly used for UAV con¬ 
trol in its simplest form, assuming a linear model perturbed 
by additive Gaus_sian_noise and subject to quadratic costs 
(LQG), e.g. ( [How et al. 2008| ). While LQG can successfully 
perform simple actions like hovering, executing more com¬ 
plex actions requires considering additional corrections for 
aerodynamic effects such as induced power or blade flap¬ 
ping ( Hoffmann et al. 2011| ). These approaches are mainly 
designed for accurate trajectory control and assume a given 
desired state trajectory that the controller transforms into 
motor commands. 

Model Predictive Control (MPC) has been used optimize 
trajectories in multi-agent UAV systems (Shim, Kim, and 
[Sastry 2003] ). MPC employs a model of the UAV and solves 


an optimal control problem at time t and state x(t) over a 
future horizon of a fixed number of time-steps. The first op¬ 
timal move u * (t ) is then applied and the rest of the optimal 
sequence is discarded. The process is repeated again at time 
t + 1. A quadratic cost function is typically used, but other 
more complex functions exist. 

MPC has mostly been used in indoor scenarios, where 
high-prec ision motion capture sys tems are available. For in¬ 
stance, in Kushle yev et al.| ( |2013| ) authors generate smooth 
trajectories through known 3-D environments satisfying 
specifications on intermediate waypoints and show remark¬ 
able success controlling a team of 20 quadrotors. Trajectory 
optimization is translated to a relaxation of a mixed inte¬ 
ger quadratic program problem with additional constraints 
for collision avoidance, that can be solved efficiently in real¬ 
time. Examples that follow a s imilar methodology can be 
found in|Turpin, Michael, and Kumar] [Augugliar o, Schoel 


|lig, and D’Andrea) ( [2012 [120121 7 Similarly to our approach, 

these methods use a simplified mod el of dynamics, either us- 
ing the 3-D position and yaw angle Kushleyev et al.[ [Turpin, 
Mic hael, and Kumar] (|2013[ |2012|) or the position and ve¬ 


locities as in Augugliaro, Schoellig, and D’ Andrea) ( |2012| ). 
However, these approaches are inherently deterministic and 
express the optimal control problem as a quadratic problem. 
In our case, we solve an inference problem by sampling and 
we do not require intermediate trajectory waypoints. 

In outdoor conditions, motion capture is difficult and 
Global Positioning System (GPS) is used instead. Existing 
control approaches are typically either based on Reynolds 
flocki ng ([Btirkle, Segor, and Kollmann 2011 1 jHauert et~aL 
|2011[ jVasarhelyi et al. 20 14[ [Reynolds 1987| or flight for- 
mationjfGuerrero "and Lozano 2012[ |Yu et al. 2013| ). In 

Reynolds flocking, each agent is considered a point mass 
that obeys simple and distributed rules: separate from neigh¬ 
bors, align with the average heading of neighbors and steer 
towards neighborhood centroid to keep cohesion. Flight for¬ 
mation control is typically modeled using graphs, where ev¬ 
ery node is an agent that can exchange information with all 
or several agents. Velocity and/or position coordination is 
usually achieved using consensus algorithms. 

The work in Quintero, Collins, and Hespanha (|2013| 
shares many similarities with our approach. Authors derive a 
stochastic optimal control formulation of the flocking prob¬ 
lem for fixed-wings UAVs. They take a leader-follower strat¬ 
egy, where the leader follows an arbitrary (predefined) pol¬ 
icy that is learned offline and define the immediate cost as 
a function of the distance and heading with respect to the 
leader. Their method is demonstrated outdoors with 3 fixed- 
wing UAVs in a distributed sensing task. As in this paper, 
they formulate a SOC problem and perform MPC. However, 
in our case we do not restrict to a leader-follower setup and 
consider a more general class of SOC problems which can 
include coordination and cooperation problems. 

Planning approaches Within the planning community, 
Bernardini, F ox, and Long| ( [20l4] ) consider search and track¬ 


ing tasks, similar to one of our scenarios. Their approach 
is different to ours, they formulate a planning problem that 
uses used search patterns that must be selected and se- 




















































































quenced to maximise the probability of rediscovering the 
target. | Albore et al. ( 2015| ) and Chanel, Te ichteil-Knigsbuch, 


|and Lesire| ( |2013| ) consider a different problem: dynamic 
data acquisition and environmental knowledge optimisation. 
Both techniques use some form of replanning. While |A1-| 
|bore et ah] ( |2015| ) uses a Markov Random Field framework 
to represent knowledge about the uncertain map and its qual¬ 
ity, Chanel, Teichteil-Knigsbuch, and Lesire ( |2013| ) rely on 
partially-observable MDPs. All these works consider a sin¬ 
gle UAV scenario and low-level control is either neglected 
or deferred to a PID or waypoint controller. 

Recent Progress in Path-Integral Control There has 
been significant progress in PI control, both theoretically and 
in applications. Most of existing methods use parametrized 
policies to overcome the main limitationsjsee Secti on |3.1|). 
Examples can be found in j Theodorou, Buchl i, and Sc haal; 
Stulp and Sigaud; [Gomez et al.| ( [2010} |2012[ |2014| . In 


these methods, the optimal control solution is restricted by 
the class of parametrized policies and, more importantly, 


it is computed offline. In Rawlik, Toussaint, and Vijayaku 
mar| ( |2013] ), authors propose to approximate the transformed 


cost-to-go function using linear operators in a reproducing 
kernel Hilbert space. Such an approach requires an analyt¬ 
ical form of the PI embedding, which is difficult to obtain 
in general. In Horowitz, Damle, and Burdick ( 2014| ), a low- 
rank tensor representation is used to represent the model dy¬ 
namics, allowing to scale PI control up to a 12-dimensional 
system. More recently, the issue of state-dependence of the 


optimal control has been addressed (Thijssen and Kappen 
|2015| ), where a parametrized state-dependent feedback con¬ 
troller is derived for the PI control class. 

Finally, model predictive PI control has been recently pro¬ 
posed for controlling a nano-quadrotor in indoor settings 
in an obstacle avoidance task ( [Williams, Rombokas, and| 
[Daniel 2014| . In contrast to our approach, their method is 
not hierachical and uses naive sampling, which makes it less 
sample efficient. Additionally, the control cost term is ne¬ 
glected, which can have important implications in complex 
tasks involving noise. The approach presented here scales 
well to several UAVs in outdoor conditions and is illustrated 
in tasks beyond obstacle avoidance navigation. 


3 Path-Integral Control for Multi-UAV 
planning 

We first briefly review PI control theory. This is followed by 
a description of the proposed method used to achieve motion 
planning of multi-agent UAV systems using PI control. 

3.1 Path-Integral Control 

We consider continuous time stochastic control problems, 
where the dynamics and cost are respectively linear and 
quadratic in the control input, but arbitrary in the state. 
More precisely, consider the following stochastic differential 
equation of the state vector x G M n under controls u G M m 

dx = f(x)d£ + G(x)(u dt + (1) 

where £ is m —dimensional Wiener noise with covariance 

G j and f ( x ) e M n and G ( X ) e R nxm afe ar _ 


bitrary functions, f is the drift in the uncontrolled dynamics 
(including gravity, Coriolis and centripetal forces), and G 
describes the effect of the control u into the state vector x. 

A realization r = xq^t of the above equation is called 
a (random) path. In order to describe a control problem we 
define the cost that is attributed to a path (cost-to-go) by 

5'('r|x 0 , u) = r T (x T ) 

+ E] (r t (x t )dt+ lujRutj dt, (2) 

t=0:dt:T—dt ' 

where r^(x T ) and r t (x t ) are arbitrary state cost terms at 
end and intermediate times, respectively. R is the control 
cost matrix. The general stochastic optimal control problem 
is to minimize the expected cost-to-go w.r.t. the control 

u* = arg minE[5(r|xo, u)]. 

u 

In general, such a minimization leads to the Hamilton- 
Jacobi-Bellman (HJB) equations, which are non-linear, sec¬ 
ond order partial differential equations. However, under the 
following relation between the control cost and noise co- 
variance = AR _1 , the resulting equation is linear in the 
exponentially transformed cost-to-go function. The solution 
is given by the Feynman-Kac Formula, which expresses op¬ 
timal control in terms of a Path-Integral, which can be in¬ 
terpreted as taking the expectation under the optimal path 
distribution ( [Kappen 20051 ) 

P*(t|x 0 ) ocp(r|xo,u)exp(-S'(r|xo,u)/A), (3) 
(u t *(x 0 )) = (u t + {it+dt ~ it)/dt), (4) 

where p(r |x 0 , u) denotes the probability of a (sub-optimal) 
path under equation Q and (•) denotes expectation over 
paths distributed by p*. 

The constraint = AR _1 forces control and noise to act 
in the same dimensions, but in an inverse relation. Thus, for 
fixed A, the larger the noise, the cheaper the control and vice- 
versa. Parameter A act as a temperature: higher values of A 
result in optimal solutions that are closer to the uncontrolled 
process. 

Equation 0 permits optimal control to be calculated by 
probabilistic inference methods, e.g., Monte Carlo. An in¬ 
teresting fact is that equations 00 hold for all controls 
u. In particular, u can be chosen to reduce the variance in 
the Monte Carlo computation of (u£(xo)) which amounts 
to importance sampling. This technique can drastically im¬ 
prove the sampling efficiency, which is crucial in high di¬ 
mensional systems. Despite this improvement, direct appli¬ 
cation of PI control into real systems is limited because it is 
not clear how to choose a proper importance sampling dis¬ 
tribution. Furthermore, note that equation ([4]) yields the op¬ 
timal control for all times t averaged over states. The result 
is therefore an open-loop controller that neglects the state- 
dependence of the control beyond the initial state. 

3.2 Multi-UAV planning 

The proposed architecture is composed of two main levels. 
At the most abstract level, the UAV is modeled as a 2D 
point-mass system that follows double integrator dynamics. 











































Algorithm 1 PI control for UAV motion planning 

1: function PICONTROLLER(iV, H , dt , r t (-), ) 

2: for fc = 1,..., N do 

3: Sample paths T k = {x t:dt:t+H } k with Eq. Q 

4: end for 

5: Compute Sk = 5(rfc|xo, u) with Eq. ^ 

6: Store the noise realizations {^t-.dt-.t+H} k 

7: Compute the weights: Wk = e ~ Sk ^ x / e~ Sl ^ x 

8: for s = t : dt : t + H do 

9: u* = u, + ^ Efc ({6+dt}fc - {&,}*) 

10: end for 

11: Return next desired velocity: Vt+dt = v t + u *dt 

and y^.d t . t+H for importance sampling at t + dt 

12: end function 


Flight Control System (1000Hz) 



Figure 2: The flight control system (FCS) is comprised of 
two control loops: one for stabilization and the other for pose 
control. A low-level controller interacts with the FCS over a 
serial interface to stream measurements and issue control. 


At the low-level, we use a detailed second order model that 
we learn from real flight data ( De Nardi and Holland 2008[ ). 
We use model predictive control combined with importance 
sampling. There are two main benefits of using the proposed 
approach: first, since the state is continuously updated, the 
controller does not suffer from the problems caused by us¬ 
ing an open-loop controller. Second, the control policy is not 
restricted by any parametrization. 

The two-level approach permits to transmit control sig¬ 
nals from the high-level PI controller to the low-level control 
system at a relatively low frequencies (we use 15Hz in this 
work). Consequently, the PI controller has more time avail¬ 
able for sampling a large number of trajectories, which is 
critical to obtain good estimates of the control. The choice of 
2D in the presented method is not a fundamental limitation, 
as long as double-integrator dynamics is used. The control 
hierarchy introduces additional model mismatch. However, 
as we show in the results later, this mismatch is not critical 
for obtaining good performance in real conditions. 

Ignoring height, the state vector x is thus composed of the 
East-North (EN) positions and EN velocities of each agent 
i = 1,..., M as Xi = \pi, Vi] T where pi, Vi G M 2 . Similarly, 
the control u consists of EN accelerations Ui G M 2 . Equa¬ 
tion O decouples between the agents and takes the linear 
form 


dxi = Axidt + B{uidt + 


A = 


0 1 
0 0 ’ 



(5) 


Notice that although the dynamics is decoupled and linear, 
the state cost rt(x t ) in equation 0 can be any arbitrary 
function of all UAVs states. As a result, the optimal con¬ 
trol will in general be a non-linear function that couples all 
the states and thus hard to compute. 

Given the current joint optimal action uj and velocity 
v t , the expected velocity at the next time t' is calculated as 
v t / »= vt + ( t' — t)u* and passed to the low-level controller. 
The final algorithm optionally keeps an importance-control 
sequence u t-.dt-.t+H that is incrementally updated. We sum¬ 
marize the high-level controller in Algorithm |T] 

The importance-control sequence u t-.dt-.t+H is initialized 
using prior knowledge or with zeros otherwise. Noise is 


dimension-independent, i.e. = cr 2 Id. To measure sam¬ 
pling convergence, we define the Effective Sample Size 
(ESS) as ESS := 1/ J2k=i w k> which is a quantity between 
1 and N. Values of ESS close to one indicate an estimate 
dominated by just one sample and a poor estimate of the 
optimal control, whereas an ESS close to N indicates near 
perfect sampling, which occurs when the importance- equals 
the optimal-control function. 


3.3 Low Level Control 

The target velocity v = [ve %] T is passed along with 
a height pu to a Velocity-Height controller. This con¬ 
troller uses the current state estimate of the real quadrotor 

y = [pePnPu fOfjuvwpqr] T , where {pe,Pn>Pu) 
and ( 0 , 0 ,^) denote navigation-frame position and orienta¬ 
tion and (u,v,w), (p,q,r) denote body-frame and angular 
velocities, respectively. It is composed of four independent 
PID controllers for roll 0, pitch 6, throttle 7 and yaw rate r. 
that send the commands to the flight control system (FCS) 
to achieve v. 

Figure [2] shows the details of the FCS. The control loop 
runs at 1kHz fusing triaxial gyroscope, accelerometer and 
magnetometer measurements. The accelerometer and mag¬ 
netometer measurements are used to determine a reference 
global orientation, which is in turn used to track the gyro¬ 
scope bias. The difference between the desired and actual 
angular rates are converted to motor speeds using the model 
in |Mahony, Kumar, and Corke ( j2012| ). 

An outer pose control loop calculates the desired angular 
rates based on the desired state. Orientation is obtained from 
the inner control loop, while position and velocity are ob¬ 
tained by fusing GPS navigation fixes with barometric pres¬ 
sure (BAR) based altitude measurements. The radio trans¬ 
mitter (marked TX in the diagram) allows the operator to 
switch quickly between autonomous and manual control of 
a platform. There is also an acoustic alarm on the platform 
itself, which warns the operator when the GPS signal is lost 
or the battery is getting low. If the battery reaches a critical 
level or communication with the transmitter is lost, the plat¬ 
form can be configured to land immediately or alternatively, 
to fly back and land at its take-off point. 












































Figure 3: Drunken Quadrotor: a red target has to be reached 
while avoiding obstacles. (Left) the shortest route is the op¬ 
timal solution in the absence of noise. (Right) with control 
noise, the optimal solution is to fly around the building. 


3.4 Simulator Platform 

We have developed an open-source framework called 
CRATE 4^ 1 The framework is a implementation of QRSim 
( |De Nardi 2013[ [Symington et al. 20l4} in Gazebo, which 
uses Robot Operating System (ROS) for high-level control. 
It permits high-level controllers to be platform-agnostic. It is 
similar to the Hector Quadrotor project |Meyer et al. 2012] ) 
with a formalized notion of a hardware abstraction layers. 

The CRATES simulator propagates the quadrotor state 
forward in time based on a second order model ( |De N ardi 
|and Holland 2008| ). The equations were learned from real 
flight data and verified by expert domain knowledge. In ad¬ 
dition to platform dynamics, CRATES also simulates var¬ 
ious noise-perturbed sensors, wind shear and turbulence. 
Orientation and barometric altitude errors follow zero-mean 
Ornstein-Uhlenbeck processes, while GPS error is modeled 
at the pseudo range level using trace data available from 
the International GPS Service. In accordance with the Mil¬ 
itary Specification MIL-F-8785C, wind shear is modeled as 
a function of altitude, while turbulence is modeled as a dis¬ 
crete implementation of the Dry den model. CRATES also 
provides support for generating terrain from satellite images 
and light detection and ranging (LIDAR) technology, and 
reporting collisions between platforms and terrain. 


4 Results 


We now analyze the performance of the proposed approach 
in three different tasks. We first show that, in the presence of 
control noise, PI control is preferable over other approaches. 
For clarity, this scenario is presented for one agent only. We 
then consider two tasks involving several units: a flight for¬ 
mation task and a pursuit-evasion task. 

We compare the PI control method described in Section 


T2| with iterative linear -quadratic Gaussian (iLQG) control 
( [Todorov and Li 2005| ). iLQG is a state-of-the-art method 
based on differential dynamic programming, that iteratively 
computes local linear-quadratic approximations to the finite 


1 CRATES stands for ’Cognitive Robotics Architecture for 

Tightly-Coupled Experiments and Simulation’. Available at 

https://bitbucket.org/vicengomez/crates 


horizon problem. A key difference between iLQG and PI 
control is that the linear-quadratic approximation is certainty 
equivalent. Consequently, iLQG yields a noise independent 
solution. 

4.1 Scenario I: Drunken Quadrotor 

This scenario is inspired in |Kappen| ( [2005| and highlights the 
benefits of SOC in a quadrotor task. The Drunken Quadro¬ 
tor is a finite horizon task where a quadrotor has to reach a 
target, while avoiding a building and a wall (figure [3]). There 
are two possible routes: a shorter one that passes through a 
small gap between the wall and the building, and a longer 
one that goes around the building. Unlike SOC, the deter¬ 
ministic optimal solution does not depend on the noise level 
and will always take the shorter route. However, with added 
noise, the risk of collision increases and thus the optimal 
noisy control is to take the longer route. 

This task can be alternatively addressed using other plan- 
ning methods, such as t he one proposed by |Ono, W illiams, 
and Blackmore ( |2013 ), which allow for specification of 
user’s acceptable levels of risk using chance constraints. 
Here we focus on comparing deterministic and stochastic 
optimal control for motion planning. The amount of noise 
thus determines whether the optimal solution is go through 
the risky path or the longer safer path. 

The state cost in this problem consists of hard constraints 
that assign infinite cost when either the wall or the build¬ 
ing is hit. PI control deals with collisions by killing par¬ 
ticles that hit the obstacles during Monte Carlo sampling. 
For iLQG, the local approximations require a twice differ¬ 
entiable cost function. We resolved this issue by adding a 
smooth obstacle-proximity penalty in the cost function. Al¬ 
though iLQG computes linear feedback, we tried to improve 
it with a MPC scheme, similar as for PI control. Unfortu¬ 
nately, this leads to numerical instabilities in this task, since 
the system disturbances tend to move the reference trajec¬ 
tory through a building when moving from one time step to 
the next. For MPC with PI control we use a receding horizon 
of three seconds and perform re-planning at a frequency of 
15 Hz with N = 2000 sample paths. Both methods are ini¬ 
tialized with u t = 0, Vt. iLQG requires approximately 10 1 * 3 
iterations to converge with a learning rate of 0.5%. 

Figure [3] (left) shows an example of real trajectory com¬ 
puted for low control noise level, g\ = 10 -3 . To be able 
to obtain such a trajectory we deactivate sensor uncertain¬ 
ties in accelerometer, gyroscope, orientation and altimeter. 
External noise is thus limited to aerodynamic turbulences 
only. In this case, both iLQG and PI solutions correspond to 
the shortest path, i.e. go through the gap between the wall 
and the building. Figure [3] (right) illustrates the solutions ob¬ 
tained for larger noise level — 1. While the optimal ref¬ 
erence trajectory obtained by iLQG does not change, which 
results in collision once the real noisy controller is executed 
(left path), the PI control solution avoids the building and 
takes the longer route (right path). Note that iLQG can find 
both solutions depending on initialization. However, How¬ 
ever, it will always choose the shortest route, regardless of 
nearby obstacles. Also, note that the PI controlled unit takes 
a longer route to reach the target. The reason is that the con- 
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Figure 4: Results: Drunken Quadrotor with wind: For different wind velocities and fixed control noise = 0.5. (Left) cost 
of the obtained solutions and (Right) percentage of crashes using iLQG and PI. 


trol cost R is set quite high in order to reach a good ESS. 
Alternatively, if R is decreased, the optimal solution could 
reach the target sooner, but at the cost of a decreased ESS. 
This trade-off, which is inherent in PI control, can be re¬ 
solved by incorporating feedback control in the importance 
sampling, as presented in Thijssen and Kappen ( |2015| ). 

We also consider more realistic conditions with noise not 
limited to act in the control. Figure [4] (a,b) shows results in 
the presence of wind and sensor uncertainty. Panel (a) shows 
how the wind affects the quality of the solution, resulting in 
an increase of the variance and the cost for stronger wind. 
In all our tests, iLQG is not able to bring the quadrotor to 
the other side. Panel (b) shows the percentage of crashes 
using both methods. Crashes occur often using iLQG con¬ 
trol and only occasionally using PI control. With stronger 
wind, the iLQG controlled unit does occasionally not even 
reach the corridor (the unit did not reach the other side but 
did not crash either). This explains the difference in percent¬ 
ages of Panel (b). We conclude that for multi-modal tasks 
(tasks where multiple solution trajectories exist), the pro¬ 
posed method is preferable to iLQG. 



Figure 5: Holding pattern in the CRATES simulator. Ten 
units coordinate their flight in a circular formation. In this 
example, N = 10 4 samples, control noise is = 0.1 and 
horizon H = 1 sec. Cost parameters are v mm = l,f max = 
3, C h it = 20 and d = 7. Environmental noise and sensing 
uncertainties are modeled using realistic parameter values. 


4.2 Scenario II: Holding Pattern 

The second scenario addresses the problem of coordinating 
agents to hold their position near a point of interest while 
keeping a safe range of velocities and avoiding crashing into 
each other. Such a problem arises for instance when multiple 
aircraft need to land at the same location, and simultaneous 
landing is not possible. The resulting flight formation has 
been used frequently in the literatu re (Vasarhelyi e t al. 2014} 
|How et al. 2008[ |Yu et al. 2013[ |Franchi et al. 2012| ), but 
always with prior specification of the trajectories. We show 
how this formation is obtained as the optimal solution of a 
SOC problem. 

Consider the following state cost (omitting time indexes) 

M 

rnv{x) = ^2 ex P ( v i - ^max) + exp (Vmin - Vj) 
i=1 

M 

+ exp (|| pi -d || 2 ) + ^ Chit/ || Pi-pj || 2 

j>i 

( 6 ) 


where v max and v mm denote the maximum and minimum ve¬ 
locities, respectively, d denotes penalty for deviation from 
the origin and Chit is the penalty for collision risk of two 
agents. || • ||2 denotes £-2 norm. 

The optimal solution for this problem is a circular flying 
pattern where units fly equidistantly from each other. The 
value of parameter d determines the radius and the average 
velocities of the agents are determined from v m \ n and u max . 
Since the solution is symmetric with respect to the direc¬ 
tion of rotation (clockwise or anti-clockwise), only when the 
control is executed, a choice is made and the symmetry is 
broken. Figure [5] shows a snapshot of a simulation after the 
flight formation has been reached for a particular choice of 
parameter values^] Since we use an uninformed initial con¬ 
trol trajectory, there is a transient period during which the 
agents organize to reach the optimal configuration. The co¬ 
ordinated circular pattern is obtained regardless of the initial 
positions. This behavior is robust and obtained for a large 
range of parameter values. 

2 Supplementary video material is available at http : //www. 
mbfys.ru.nl/staff/v.gomez/uav.html 
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Figure 6: Holding pattern: (a) evolution of the state cost for different number of samples N = 10,10 2 ,10 3 . (b) scaling of the 
method with the number of agents. For different control noise levels, (c) comparison between iLQG and PI control (ratios > 1 
indicate better performance of PI over iLQG) and (d) Effective Sample Sizes. Errors bars correspond to ten different random 
realizations. 


Figure[6ja) shows immediate costs at different times. Cost 
always decreases from the starting configuration until the 
formation is reached. This value depends on several param¬ 
eters. We report its dependence on the number N of sample 
paths. For large N, the variances are small and the cost at¬ 
tains small values at convergence. Conversely, for small N, 
there is larger variance and the obtained dynamical config¬ 
uration is less optimal (typically the distances between the 
agents are not the same). During the formation of the pat¬ 
tern the controls are more expensive. For this particular task, 
full convergence of the path integrals is not required, and the 
formation can be achieved with a very small N. 

Figure [6jb) illustrates how the method scales as the num¬ 
ber of agents increases. We report averages over the mean 
costs over 20 time-steps after one minute of flight. We var¬ 
ied M while fixing the rest of the parameters (the distance 
d which was set equal to the number of agents in meters). 
The small variance of the cost indicates that a stable forma¬ 
tion is reached in all the cases. As expected, larger values of 
N lead to smaller state cost configurations. For more than 
ten UAVs, the simulator starts to have problems in this task 
and occasional crashes may occur before the formation is 
reached due to limited sample sizes. This limitation can be 
addressed, for example, by using more processing power and 
parallelization and it is left for future work. 

We also compared our approach with iLQG in this sce¬ 
nario. Figure [6jc) shows the ratio of cost differences after 
convergence of both solutions. Both use MPC, with a hori¬ 
zon of 2s and update frequency of 15Hz. Values above 1 in¬ 
dicate that PI control consistently outperforms iLQG in this 
problem. Before convergence, we also found, as in the pre¬ 
vious task, that iLQG resulted in occasional crashes while PI 
control did not. The Effective Sample Size (ESS) is shown 
in Figure [6jd). We observe that higher control noise levels 
result in better exploration and thus better controls. We can 
thus conclude that the proposed methodology is feasible for 
coordinating a large team of quadrotors. 

For this task, we performed experiments with the real plat¬ 
forms. Figure [7] shows real trajectories obtained in outdoor 



Figure 7: Resulting trajectories of a Holding Pattern experi¬ 
ment using two platforms in outdoors conditions. 


conditions (see also the video that accompanies this paper 
for an experiment with three platforms). Despite the pres¬ 
ence of significant noise, the circular behavior was also ob¬ 
tained. In the real experiments, we used a Core il laptop with 
8GB RAM as base station, which run its own ROS messag¬ 
ing core and forwarded messages to and from the platforms 
over a IEEE 802.11 2.4GHz network. For safety reasons, the 
quadrotors were flown at different altitudes. 

4.3 Scenario III: Cat and Mouse 

The final scenario that we consider is the cat and mouse sce¬ 
nario. In this task, a team of M quadrotors (the cats) has 
to catch (get close to) another quadrotor (the mouse). The 
mouse has autonomous dynamics: it tries to escape the cats 
by moving at velocity inversely proportional to the distance 
to the cats. More precisely, let p m0U se denote the 2D posi¬ 
tion of the mouse, the velocity command for the mouse is 



















































Figure 8: Cat and mouse scenario: (Top-left) four cats and one mouse. (Top-right) for horizon time H = 2 seconds, the 
four cats surround the mouse forever and keep rotation around it. (Bottom-left) for horizon time H — 1 seconds, the four 
cats chase the mouse but (bottom-right) the mouse manages to escape. With these settings, the multi-agent system alternates 
between these two dynamical states. Number of sample paths is N = 10 4 , noise level cj\ = 0.5. Other parameter values are 

d = 30, v mm = 1, rVnax = 4, v mm = 4 and r? max mouse 3. 


computed (omitting time indexes) as 


^mouse 


= V max —- 

mouse y 


M 

where v = ^ 

i= 1 


Pi Amouse 
11 Pi Amouse 11 ^ 


The parameter v max determines the maximum velocity of 

mouse _ __ 

the mouse. As state cost function we use equation ([6]) with 
an additional penalty term that depends on the sum of the 
distances to the mouse 


M 

rCM(x ) = r H p(x) + ^ || Pi~ Pmouse lb • 
i=1 


This scenario leads to several interesting dynamical states. 
For example, for a sufficiently large value of M, the mouse 
always gets caught (if its initial position is not close to the 
boundary, determined by d). The optimal control for the cats 
consists in surrounding the mouse to prevent collision. Once 
the mouse is surrounded, the cats keep rotating around it, 
as in the previous scenario, but with the origin replaced by 
the mouse position. The additional video shows examples 
of other complex behaviors obtained for different parameter 
settings. Figure [5] (top-right) illustrates this behavior. 

The types of solution we observe are different for other 
parameter values. For example, for M = 2 or a small time 
horizon, e.g. H = 1, the dynamical state in which the cats 
rotate around the mouse is not stable, and the mouse escapes. 
This is displayed in Figure [8] (bottom panels) and better il¬ 
lustrated in the video provided as supplementary material. 
We emphasize that these different behaviors are observed 
for large uncertainty in the form of sensor noise and wind. 


5 Conclusions 

This paper presents a centralized, real-time stochastic opti¬ 
mal control algorithm for coordinating the actions of multi¬ 
ple autonomous vehicles in order to minimize a global cost 


function. The high-level control task is expressed as a Path- 
Integral control problem that can be solved using efficient 
sampling methods and real-time control is possible via the 
use of re-planning and model predictive control. To the best 
of our knowledge, this is the first real-time implementation 
of Path-Integral control on an actual multi-agent system. 

We have shown in a simple scenario (Drunken Quadro- 
tor) that the proposed methodology is more convenient than 
other approaches such as deterministic control or iLQG 
for planning trajectories. In more complex scenarios such 
as the Holding Pattern and the Cat and Mouse, the pro¬ 
posed methodology is also preferable and allows for real¬ 
time control. We observe multiple and complex group be¬ 
havior emerging from the specified cost function. Our ex¬ 
perimental framework CRATES has been a key development 
that permitted a smooth transition from the theory to the real 
quadrotor platforms, with literally no modification of the un¬ 
derlying control code. This gives evidence that the model 
mismatch caused by the use of a control hierarchy is not 
critical in normal outdoor conditions. Our current research 
is addressing the following aspects: 

Large scale parallel sampling— the presented method 
can be easily parallelized, for instance, using graphics pro¬ 
cessing units, as in Williams, Rombokas, and Daniel ( 2014) . 


Although the tasks considered in this work did not required 
more than 10 4 samples, we expect that this improvement 
will significantly increase the number of application do¬ 
mains and system size. 

Distributed control— we are exploring different dis¬ 
tributed formulations that take better profit of the factorized 
representation of the state cost. Note that the costs functions 
considered in this work only require pairwise couplings of 
the agents (to prevent collisions). However, full observabil¬ 
ity of the joint space is still required, which is not available 
in a fully distributed approach. 
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